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Abstract 

In the context of the hype, promise and perils of Big Data and the currently dominant paradigm of 
data-driven decision-making, it is important to critically engage with the potential of Big Data for 
higher education. We do not question the potential of Big Data, but we do raise a number of 
issues, and present a number of theses to be seriously considered in realising this potential. 

The University of South Africa (Unisa) is one of the mega ODL institutions in the world with more 
than 360,000 students and a range of courses and programmes. Unisa already has access to a 
staggering amount of student data, hosted in disparate sources, and governed by different 
processes. As the university moves to mainstreaming online learning, the amount of and need for 
analyses of data are increasing, raising important questions regarding our assumptions, 
understanding, data sources, systems and processes. 

This article presents a descriptive case study of the current state of student data at Unisa, as well 
as explores the impact of existing data sources and analytic approaches. From the analysis it is 
clear that in order for big(ger) data to be better data, a number of issues need to be addressed. 
The article concludes by presenting a number of theses that should form the basis for the 
imperative to optimise the harvesting, analysis and use of student data. 
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Introduction 

Technology is neither good nor bad; nor is it neutral...technology’s 
interaction with the social ecology is such that technical 
developments frequently have environmental, social, and human 
consequences that go far beyond the immediate purposes of the 
technical devices and practices themselves. 

Melvin Kranzberg (1986, p. 545) 

The view that Big Data is neither inherently good nor bad, increasingly finds its way into the 
current discourses in higher education. There is therefore a need for critical scholarly engagement 
amidst claims that “Big Data represents a paradigm shift in the ways we understand and study 
our world” (Eynon, 2013, p. 237). Much of the discourses regarding Big Data in higher education 
focus on increasing efficiency and cost-effectiveness (Altbach, Reisberg & Rumbley, 2009), 
amidst concerns regarding privacy, surveillance, the nature of evidence in education, and so forth 
(Biesta, 2007, 2010; Eynon, 2013; Prinsloo & Slade, 2013; Wagner & Ice, 2012). While Big Data 
has been lauded as “chang(ing) everything” (Wagner & Ice, 2012), and “revolutionis(ing) 
learning” (Van Rijmenam, 2013), boyd and Crawford (2012, 2013) question, quite rightfully, 
whether Big Data is an unqualified good. 

The potential, perils, and the harvesting and analysis of Big Data in general, and in higher 
education in particular are, therefore, still relatively unexplored (Lyon, 2014; Siemens & Long, 

2011) . There is a need to establish a common language (Van Barneveld, Arnold, & Campbell, 

2012) as well as resolve a number of conceptual, ethical and practical matters (Andrejevic, 2014; 
boyd & Crawford, 2012; Bryant & Raja, 2014; Clow, 2013a, 2013b, 2013c; Morozov, 2013a, 2013b; 
Mayer-Schonberger, 2009; Mayer-Schonberger & Cukier, 2013; Puschmann & Burgess, 2014; 
Richards & King, 2013; Slade & Prinsloo, 2013; Siemens, 2013). 

While higher education institutions always had access to relatively large data sets and tools for 
analysis, there is an increasing amount of digital student data that can be harvested and analysed 
(Swain, 2013), as well as increased technological and analytical capabilities (Wishon & Rome, 
2012). Describing learning analytics as the “new black” (Booth, 2012), and student data as the 
“new oil” (Watters, 2013) may resemble “techno-solutionism” (Morozov, 2013b) and a certain 
“techno-romanticism” in education (Selwyn, 2014). Despite various claims regarding the success 
of learning analytics to improve student success and retention (e.g., Arnold, 2010; Clow 2013a, 
2013b, 2013c), Watters (2013) warns that “the claims about big data and education are incredibly 
bold, and as of yet, mostly unproven” (par. 17). Without ascribing to a technological pessimism, 
we have to critically explore the current belief in higher education that harvesting and analysing 
increasingly bigger data sets, will, necessarily improve student success and retention (boyd & 
Crawford, 2012; Briggs, 2014; Epling, Timmons & Wharrad, 2003). 
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This article does not question the potential of big(ger) data to be better data but would rather 
present a number of theses that may ensure that big(ger) data will result in a more thorough 
understanding of the complexities of student success and retention, and more appropriate, 
timeous and effective institutional support. 


Method and Context 

This descriptive and interpretive case study explores the proposition of big(ger) data within the 
context of a mega open, distance learning institution (ODL), the University of South Africa 
(Unisa). With more than 360,000 students registered for a variety of qualifications (more than 
1,200) and selecting from a range of close to 3,000 courses- the size of available data may already 
qualify as ‘big, ’ though we acknowledge that compared to definitions of Big Data (e.g., Kitchen, 
2013) the data harvested, analysed and used by Unisa currently falls short. 

As a descriptive case study, the issues and theses raised do not claim to be original, nor 
generalisable to other higher education institutions, but rather linked to a number of theoretical 
propositions or theses (Yin, 2009). The value of descriptive case studies lies in its potential to 
extrapolate from a specific phenomenon a number of abstract interpretations of data and 
propositions for theoretical development (Mills, Eurepos & Wiebe, 2010). As an interpretive case 
study we attempt to not only describe a specific case of the use of student data, but interpret the 
case to move to a temporary conceptual framework, propositions or theses (Thomas, 2011). Our 
aim is therefore to advance phronesis or practical knowledge. 

As Unisa increasingly moves into digitised and online learning, the amount of data available, as 
well as increased analytical capability provides ample foundation for an intensification of data 
harvesting and analysis efforts. Some of the questions integral to our exploration relate to how 
optimally we use data that we already have access to and whether more and/or different data will 
support our endeavours in finding the Holy Grail of a unified theory of student engagement, 
retention and success. How do we distinguish between signals and noise (Silver, 2012)? We 
accept that harvesting more and different data may hold potential, but if we do not think critically 
about institutional support and operational integration with regard to the harvesting and analysis 
of data, we may never realise the potential of bigger data. 

All of the authors except one, who is a researcher in open distance learning, are involved in 
institutional research at Unisa, with the mandate to provide high quality, relevant and timely 
information, analysis and institutional research, making the institution more intelligible to itself. 
As such, the question regarding Big Data is central to the daily praxis of the researchers. 

The authors engaged with literature and the current status of student data at Unisa in a 
systematic way over the course of six months. Several meetings were held, and the analysis in this 
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article resulted from the field notes from these meetings. Construct validity was ensured by 
engaging with multiple sources of evidence. The team confirmed internal validity by explanation 
building and addressing rival explanations as suggested by Yin (2009). The resulting theses were 
compared with the literature and available evidence in the field. 

This paper employs the insights, experience and thoughts of these researchers engaging with the 
often undocumented realities of engaging with big(ger) data within a mega open, distance 
learning institution. The aim of the paper is to move beyond the discussion of whether or not 
big(ger) data is better data towards the more practical questions of: in order for big(ger) data to 
be better data, what needs to be in place? 


Big Data 

Big Data refer to “the capacity to search, aggregate and cross-reference large data sets” (boyd & 
Crawford, 2012, p. 663) and should be explored not only for its potential but also to question its 
capacities, its socio-political consequences and the need for critique (Lyon, 2014). “Big Data is, in 
many ways, a poor term” (boyd & Crawford, 2012, p. 1) and increasingly refers to metadata or 
“data about data” (Lyon, 2014, p. 3). Rob Kitchen (in Lyon, 2014, p. 5) describes Big Data as 
having the following characteristics: 

huge volume, consisting of terabytes or petabytes of data; high 
velocity, being created in or near real time; extensive variety, 
both structured and unstructured; exhaustive in scope, striving 
to capture entire populations of systems; fine-grained resolution, 
aiming at maximum detail, while being indexical in 
identification; relational with common fields that enable the 
conjoining of different data-sets; flexible, with traits of 
extensionality (easily adding new fields) and scalability (the 
potential to expand rapidly) 

While most of the current discourses emphasise the increasing amount of data, the ‘real’ value 
(and peril) in Big Data lies in its networked and relational nature (Bauman & Lyon, 2013; boyd & 
Crawford, 2012; Marwick, 2014; Mayer-Schonberger & Cukier, 2013; Solove, 2001) with “at least 
three significant actors in this drama, government agencies, private corporations and, albeit 
unwittingly, ordinary users” (Lyon, 2014, p. 3). “It is the kind of data that encourages the practice 
of apophenia: seeing patterns where none actually exist, simply because massive quantities of 
data can offer connections that radiate in all directions” (boyd & Crawford, 2012, p. 2). 

In the context of Big Data, there is talk of the “age of analytics” (Tene & Polonetsky, 2012, p.i), 
and increasingly, the “algorithmic turn”, “the algorithm as institution” (Napoli, 2013, p. 1), the 
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“threat of algocracy” (Danaher, 2014) and “algorithmic regulation” (Morozov, 2013a, par.15). In 
these instances, algorithms have regulative, normative, and cultural-cognitive dimensions in the 
intersection between algorithm and institution where code becomes law (Napoli, 2013, referring 
to Lessig, 2006). A number of authors interrogate the potential of Big Data through the lens of 
“societies of control” (Deleuze, 1992, p. 4) (also see Henman, 2004). Big Data and its algorithms 
resemble a possible “gnoseological turning point” in our understanding of knowledge, 
information and faculties of learning where bureaucracies increasingly aspire to transform and 
reduce “ontological entities, individuals, to standardized ones through formal classification” into 
algorithms and calculable processes (Totaro &Ninno, 2014, p. 29). 

A number of authors (boyd & Crawford, 2013; Lyon, 2014; Richards & King, 2013) therefore posit 
some provocations regarding Big Data that demand critical reflection. The increasing reliance on 
Big Data questions our traditional assumptions about knowledge in the context of Big Data’s 
claim to “objectivity and accuracy [which] are misleading.” We also need to realise that “not all 
data are equivalent” (boyd & Crawford, 2013, pp. 3-12). There is also the potential that Big Data 
will create new divides and be employed to perpetuate and increase existing inequalities and 
injustices (Andrejevic, 2014; boyd & Crawford, 2013; Couldry, 2014; Lyon, 2014; Richard & King, 
2013). 

There are also claims that Big Data “is about what, not why. We don’t always need to know the 
cause of the phenomenon; rather, we can let data speak for itself’ (Mayer-Schonberger & Cukier, 
2013, p. 14). And herein lays the dilemma. Data cannot speak for itself, boyd and Crawford 
(2012) refer to this assumption as “an arrogant undercurrent in many Big Data debates” (p. 4) 
and Gitelman (2013) states that raw data “is an oxymoron.” 

Amidst the hype and relative current scarcity of evidence regarding the impact of learning 
analytics on increasing the effectiveness of teaching and learning (Altbach, Reisberg & Rumbley, 
2009; Clow, 2013a; Watters, 2013), it is necessary to problematise the relationship between Big 
Data and education. Selwyn (2014) remarks that educational technology “needs to be understood 
as a knot of social, political, economic and cultural agendas that is riddled with complications, 
contradictions and conflicts” (p. 6). There are also concerns about education’s current 
preoccupation with evidence-based teaching and learning (Biesta, 2007, 2010). 

In the following section we therefore attempt to chart the relationship of learning analytics with 
academic analytics in the context of the bigger picture of data in education and Big Data. 

Learning Analytics, Academic Analytics and Big Data 

This article is not primarily concerned with the distinctions, scope and definitions of the different 
terminology used in the context of Big Data in higher education such as business, academic and 
learning analytics (see for example Buckingham Shum & Ferguson, 2012; Campbell, DeBlois, & 
Oblinger, 2007; Clow, 2013a; Siemens & Long, 2011; Van Barneveld et al., 2012). 
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We therefore adopt the joint use (not definition) of learning and academic analytics (as proposed 
by Siemens & Long, 2011). Learning analytics concerns itself with how students learn, integrating 
many aspects of students’ transactions with amongst others, the learner management system, 
while areas such as admission, course success, graduation, employment and citizenship are better 
encompassed under the broader definition of academic analytics (see Siemens & Long, 2011). The 
notion of Big Data incorporates all sorts of electronic interactions even beyond the higher 
education environment, incorporating different elements of students’ digital identities and 
personal learning environments (PLEs). Interestingly, as various sources of information start to 
resemble “electronic collages” and an “elaborate lattice of information networking” (Solove, 2004, 
p. 3), learning analytics may potentially overlap with academic analytics, the available data in the 
context of higher education, and data available ‘outside’ the strict boundaries of higher education 
(see Diagram 1). 

Diagram 1: Mapping Learning and Academic Analytics in the Context of Big Data 
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It is crucial to delimit the scope of this article’s focus on the potential of big(ger) data as better 
data as focusing only on the data available in, harvested and analysed by higher education 
institutions. This delimitation is, however, somewhat problematic, as the lattices of information 
networking (Solove, 2004) do not necessarily care about the (relatively) artificial boundaries 
between data ‘in’ higher education and data ‘outside’ of higher education. There is evidence that 
the nature of these boundaries may, actually, become increasingly artificial as student data 
becomes a potential income stream for institutions, and secondly, as student learning moves 
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beyond the strict confines of institutional learning management systems (LMS) (Prinsloo & Slade, 
2013) (also see Bennet, 2001; Giroux, 2014; Lyon, 2014; Marx & Muschert, 2007). 


An Overview of the Current State of Student Data at Unisa 

The current state of the harvesting and use of student data at Unisa should be understood in 
terms of the institution’s understanding of student retention and success (the student journey or 
walk), and secondly in the context of existing data sources and analytical approaches. The aim of 
this section is therefore not to offer immediate and possible solutions to some of the challenges 
experienced, but to contextualize and highlight the current state. 

Student Data and the Student Journey 

Unisa understands and predicts student retention, dropout and student success based on a socio- 
critical understanding of student success as described by Subotzky and Prinsloo (2011). This 
model draws attention to five student data elements, which can be harvested to support 
institutional planning and decision-making and to determine possible student support 
interventions. These include admission, learning activity, course success, graduation, and 
employment and citizenship. Data, which pertains to learning activities is perhaps the one area 
where learning analytics finds it home given that it is concerned with how students learn. The 
other areas (admission, course success, graduation, and employment and citizenship) are better 
encompassed under the broader definition of academic analytics (see Siemens & Long, 2011). 

Admissions and Registrations 

A variety of demographic data is captured at registration about the student, including attributes 
such as gender, ethnicity, age and educational background, which can be used by researchers to 
create student profiles linked to success indicators such as exam success and graduation, as well 
as learning analytics data, which are available on the learning management system (LMS). The 
integrity of the data, is of course something that needs to be regularly monitored. There are 
important profiling elements, which are not currently available in the databases, such as living 
standard measures and access to ICTs - these are instead sourced through research surveys 
involving samples of students and not the entire student population. 

Learning Activities 

This aspect of the student journey includes, but is not limited to, downloading study materials 
and learning objects, preparing and submitting assignments, engaging with e-tutors and e- 
mentors, engaging with a range of learning facilitators via the LMS through discussions groups 
and blogs, uploading assignments as they prepare for examinations and digitised learning activity 
indicators such as tutorial attendance. The Unisa LMS is a key data source, which can be used for 
harvesting and making sense of student data, and discussions are underway to make relevant data 
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available for research and analysis purposes. It is important to note that while most of the above 
data can be made available, the data currently reside in different systems, which are not 
integrated. This presents a challenge and lengthens the time of gaining access to relevant data. 

Considering the fact that much of students’ learning and learning activities may actually take 
place outside of the institutional LMS in digital and non-digital personal learning networks 
(PLNs) or personal learning environments (PLEs), this poses some additional challenges to the 
harvesting and analysis of data, on a number of levels (e.g., interoperability, appropriateness of 
data, etc.). Slade and Prinsloo (2013) also recognised the potential loss of gaining a “holistic 
picture of students’ lifeworlds” (p. 1515) due to the “inability to track activity outside of an 
institution” (also see Prinsloo & Slade, 2014). 

Course Success 

Course success (as measured by passing the examination) remains but one element of measured 
success (Subotzky & Prinsloo, 2011). The examination process is closely monitored and reported 
on, covering aspects of exam admission, exam absence, and success in terms of number of 
students that passed relative to the number that sat for the examination. Additional information 
is also available on deferred examination statistics. All these data provide a comprehensive view 
of the examination event. These data can also be extended to compute other cohort aspects of 
success including examination passes relative to enrolments and intake. A number of reporting 
repositories are available to academics for the dissemination of examination statistics. 

Recent engagement in this area suggests that attrition during the examination event is relatively 
small and focused attention is now moving to retention prior to the examination. The 
combination of academic and real-time learning analytics will allow us to get a sense of students’ 
losing or changing momentum as they progress, whether by analysing their engagement in online 
forums, and/or submission of assignments. 

Retention 

At present, most of the analysis on retention is focused on various forms of student cohort 
analyses that are conducted and reported on at various decision making forums. While cohort 
retention and throughput information is readily available, the issue of the level of granularity of 
the data is under question and automated ‘drill through’ reports are not yet in place. These 
analyses remain plagued by data structure issues and the lack of a nuanced definition of ‘drop¬ 
out’. A pertinent issue is that student dropout is only picked up when students fail to submit 
assignments, sit for the examination or re-register. Currently there is no mechanism in place to 
record, analyse and respond to real-time data, for example, where students do not log onto the 
LMS in more than a week, which would then enable early identification of students at risk of 
drop-out and the ability to provide personalised support and feedback. (For a critical appraisal of 
the use of learning analytics to increase student retention, see Clow, 2013b.) 
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Graduation 

Analysis of graduation data provides another lens and level of granularity on student retention. 
Graduation data are available for analysis and reporting purposes to a range of stakeholders. 
However, this is a complex process as graduation data can be reported from a number of different 
perspectives, which are mutually exclusive and result in different outcome indicators. 

Employment and Citizenship 

Though the Alumni database at the institution stores data on graduates and their contact details, 
it is almost impossible to keep this database current. As an important data source, an up-to-date 
and information-rich Alumni database may provide crucial data for analyses such as measuring 
the impact of qualifications on graduate employment trajectories, as well as provide opportunities 
for post-enrollment guidance and lifelong learning. 


Engaging with the Student Data Elements 

Existing Data Sources and Analytical Approaches 

Existing data sources. 

The existing data sources lie in a number of disparate operational systems maintained by various 
functional units within the institution. Broadly, the roles of Data Owners, Data Custodians and 
Data Stewards have been identified and communicated, however, the need for cross-functional 
data access makes operating within these specific roles problematic. A far greater level of systems 
integration needs to be achieved if these roles are to be meaningfully applied. The disparate 
systems typically have grown from the need of different functional areas to have customised 
functionality built into operational systems. These developments take place without a reporting or 
analytic objective in mind with the result that levering sensible analytics from these sources is a 
problem. 

The disparate nature of systems at Unisa has not embraced the concept of central warehousing, 
with data stores remaining fragmented into the major areas of (1) student administration, (2) 
learner management, (3) human resources, (4) finance and (5) space and the built environment. 
Not only are these areas supported by different operational systems, they have produced separate 
data warehouses and even separate business intelligence frameworks. While this is not a problem 
for operational needs, it remains a huge problem for integrated analysis and reporting, which 
requires the integration of these data sets. 

Considering the velocity, agility and complexity of Big Data in the context of disparate existing 
data sources, it is clear that in order for big(ger) data to be better data, Unisa will have to rethink 
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the governance and integration of these different sources. Not only does Big Data increasingly 
allow the same data to be used for different purposes (Lyon, 2014), but with each recursion, there 
is an increasing threat of the loss of original context. With each new configuration and 
combination the data may “be given new meanings that are cut off from the values that once made 
sense of them and the identifiable subjects whose activities generated them in the first place” 
(Lyon, 2014, p. 6) (also see Amoore, 2011). 

Analytical Approaches 

The disparate nature of the various data sources typically requires technically skilled staff within 
the institution to be commissioned to translate the ‘business rules’ of the organisation into 
extracts that afford data migrated from an operational system towards an analytic framework. 
Data extracts vary in complexity depending on the alignment of business rules to analytic 
opportunities. In many cases, time-based analyses of the raw data are not possible as there are no 
date stamps on certain areas of the data. For example, the need to analyse the academic drift in 
the offerings of the institution requires the Programme Qualification Mix (PQM) or academic 
structure to be stored in a way that differentiates time (per year). In Unisa’s case, the academic 
structure can be overwritten each year with the new (or combinations of) offerings with the result 
that the history of the data set is lost. The warehousing project will then have to take snapshots at 
designated points in the year to store this information in order to conduct the analytics. 

Once on the warehouse a number of ‘views’ of the data are required primarily to address the 
performance of analytics and are set up to constitute a ‘reporting layer’. Since the core business of 
the institution is undergraduate teaching and learning, more emphasis is placed on the various 
student views. The term used here does not refer to a specific table on a database but rather a 
design feature that can be applied to any technical environment. Typically, views are determined 
by a particular granularity (level of detail) of the data, or could also be designed around a 
common ‘currency’ in the data. For example, a course view has a significantly different granularity 
of student information compared with a programme or qualification view. Currency would refer 
to the underlying common element linking aspects of the analytics, for example the course could 
form a common currency for assignment information, tutor information and student information. 

Broadly speaking, student views lie separately in the areas of assignments, courses, qualifications, 
tutors and lecturers. All metrics are governed by the structure within which they reside. Courses 
and qualifications reside within the academic structure or curriculum, but are also closely 
associated with the organisational structure at the time. Changes in these areas over time need to 
be addressed in an automated way in order to retain the integrity of analytics. Challenges in these 
areas emanate when the objectives of analytics are not clearly defined as an institutional take on 
reporting requirements. A good example of this would be the creation or dissolution of a faculty, 
school or college. These major changes occur relatively frequently in the current higher education 
environment and have major impacts on the intelligence of the institution. An institutional 
decision is required where reporting is either done employing the current structure or not. Many 
institutions choose to link all dependencies to the current structure and to transpose the history 
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in the data to that structure. In the case of our current example, if a college is created in a 
particular year, the history of all the related metrics (enrolments, exam statistics, assignments, 
etc.) are linked to the college and reported as if it were in existence in history. The question which 
needs to be considered is ‘at what point is historical data irrelevant?’ 

The design of the reporting layer takes cognisance of the vast variety of analytics to be performed, 
from automated summary tables that underpin aggregated dashboards to the need for extended 
data mining of all attributes in these views to pick up underlying trends. The main purposes for 
the creation of the reporting layer is to support the user requirements of analysts and researchers, 
this being different from Management Information referred to above. Data sets are provided to 
these users either via direct access to the warehouse environment (SQL Server database), by 
provisioning front-end tools linked to these sources (MS Excel pivots, IBM SPSS, etc.) and also 
via automated web applications (Business Intelligence tools, customised net applications, 
proprietary tools, etc.). 


Ensuring Big(Ger) Data as Better Data: A Number of 

Theses 

Having described the current state of student data in a descriptive case study, we now propose a 
number of theoretical theses to inform further research and policy development. The theses are 
not mutually exclusive and the sequencing of the theses indicate a relative weighting in 
importance in order to address the research focus of this article, namely: In order for big(ger) 
data to be better data, what are the issues that we need to consider? 

Thesis 1: A Critical View of Big Data 

Amidst the hype and promise of Big Data in higher education, it is crucial to critically question 
and engage not only with its potential and its perils, but also with our assumptions about student 
engagement, retention and success. In order for big(ger) data to be better data in the context of 
Unisa, it is clear that a skeptical (Selwyn, 2014) and most probably a disenchanted view of Big 
Data’s potential and promise is necessary (Couldry, 2014) (also see boyd & Crawford, 2012, 2013; 
Lyon, 2014 and Richards & King, 2013). In following Gitelman (2013) we believe that raw data “is 
an oxymoron”, and contrary to Mayer-Schonberger and Cukier (2013) we do not believe that data 
speaks for itself. We need to consider ‘what’ and the ‘why’ (also see Kitchen, 2013). Considering 
that data is increasingly used to determine access to support and resources (Henman, 2004), we 
need to be cognisant of the impact and unintended consequences of our assumptions 
underpinning the algorithmic turn in higher education. Being skeptical about the potential of 
more data to be necessarily better data, we also need to critically engage with why much of the 
data harvested and analysed are not used (Macfadyen & Dawson, 2012). 
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Not only does such a critical and even skeptical view of big(ger) data herald and necessitate the 
need for a paradigm shift, such an approach instigates a number of other paradigm shifts with 
regard to institutional processes, skills-sets, systems and institutional knowledge. 

Thesis 2: Process Changes 

A combination of systems is used at different levels to harvest the data, but first let us determine 
the sources. The main source and management of data occur at the enterprise system level 
(Oracle student, Oracle HR, LMS, XMO, etc.). At this level, the data entry points are established 
and processes are put in place to capture and manage data input. It is at this level where 
boundaries and roles need to be clearly delineated to ensure the correct access is given to those 
that need update rights on the data. In many cases, we have staff with access to update data 
elements that are not specific to their environment and operational needs. 

From these systems, a very limited quantity of Management Information (MI) is provided, mainly 
because these systems are in-house developments, which were not designed with a reporting 
perspective in mind. We make a distinction between MI and analytics in this article and submit 
that any respectable proprietary or in-house developed system should cater for all the needs of the 
user to perform the operational functions of their respective departments. As we move towards 
embracing analytics and the possible benefits of big(ger) data as better data, a paradigm shift will 
be required as it is this very operational and transactional data that will provide the basis for real 
time analytics. 

The process of harvesting is important as during this process attention is given to the quality of 
the data. Procedures in place rely on a series of ‘exception’ or ‘error’ reports being run to 
determine data quality issues. These processes also allow for feedback to the source and 
responsible person to address data quality issues. This process is an iterative one of checking, 
feedback, fixing and re-extracting the data before release to users in the reporting layer. 

Thesis 3: Skills and Capabilities Shift 

The harvesting and analysis of available data by institutional researchers within Unisa is 
imperative as a contributor, firstly, towards understanding student success and secondly, to 
support strategic and operational decision making in order to ensure institutional sustainability 
and growth. However, where institutional researchers have dwelled within a paradigm of more 
traditional research designs and methods, Big Data offer significant new challenges from a data 
harvesting and mining perspective. 

The availability of such Big Data is, however, still only in its infancy at Unisa. In order for Unisa 
researchers to harvest and analyse such data in future to create valuable and actionable insights, 
a new range of skills and competencies will be required. Traditional research methods often focus 
on data generation due to a scarcity or lack of data. This required researchers to focus and 
develop their skillsets around areas such as the science of measurement (are we measuring what 
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we are supposed to measure), data collection (to ensure representativeness) and analysis (can we 
make inferences about the larger population). 

The promise of Big Data offers a new world and requires a new skillset for analysts. Big data 
emanate from multiple sources, but require some manipulations to answer specific research 
questions. Researchers and stakeholders must now scope information needs to more exact 
questions. But with more data comes more noise and also missing values (i.e., gaps in data). The 
‘new’ institutional researcher must have skills to find or direct others to meaningful big data and 
make relevant connections. Programming skills in new languages combined with multi-source 
data mining, statistical modelling and prediction are now required. While the development of 
algorithms will be a part of this process, from an institutional and pedagogical perspective, an 
understanding of what drives student learning and success will remain key. Institutional 
researchers must balance the “what” provided by the patterns in data with the “why” which 
require more in-depth investigation through traditional research approaches. Identifying 
correlations alone will therefore not be sufficient. See Table 1 for a contrasting view between 
traditional approaches to institutional research and data analyses and modern analytics. 
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Table 1 

Shifts from Traditional Institutional Research Skills to the Demands of Modern Analytics 



More Traditional Institutional 
Research 

Modern Analytics 

Data 

• Scarcity of data 

• Big Data 


• Focus on data 

• Multiple data 


generation 

sources 


• Generally historical, 

• Generally trend and 


trend or snap shot 
orientation 

real-time data. 

Skillset emphasis 

• Science of 

• Finding meaning in 


measurement including 

Big Data and making 


instrument design and 

relevant connections 


construct measurement 

• Programming skills 


• Data collection to 

• Multi-source data 


ensure representativeness 

mining 


and generalisations 

• Statistical 


• Analysis for inference 

modelling 


about the larger population 

• Development of 
algorithms 

Approach 

• Multitude of possible 

• Big Data, not 


questions narrowed down 

tailored to any 


to more specific research 

questions, narrowed 


questions 

down to information 
needs for specific 
questions 

Main complexities 

• Limited granularity 

• More data = more 


and ability to segment due 

noise, difficulty to 


to small numbers 

determine which data 


• Representativeness of 

is meaningful and what 


sample and ability to make 

the patterns are 


inferences about the larger 

reflecting 


population 

• Missing values 

• Data Quality 

Driver 

• Understanding of what drives student learning and 
success 


Will the future institutional researcher be one that is described as statistician, mathematician, 
computer scientist, database administrator, coder, hardware guru, systems administrator, 
researcher and interrogator, all in one? Or will the individual make way for a more team based 
approach? A literature search points to a lack of research investigating the demands of Big Data 
on the skills and capabilities of institutional researchers. The convergence of traditional 
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institutional research skills, data science, analytical services and organisation intelligence 
therefore becomes a key area of consideration that should be taken cognisance of. 

Thesis 4: Systems Evolution 

The move toward central warehousing is imperative if the potential of Big Data and enterprise¬ 
wide analytics is to be achieved. A systematic mapping of the data elements needed for reporting 
and analysis is required, this in conjunction with an overview approach to systems architecture. 
Very little opportunity is taken to look at reporting and analytic requirements from an 
institutional perspective, which is exacerbated by the lack of clear roles and responsibilities, poor 
communication between affected parties and ‘turf wars’. 

Unisa should move towards the deployment of an ‘information control centre’, which has 
analytics as the primary objective, and which is built from a solid base of technical hardware, 
software and skilled personnel. The use of sophisticated software to bridge the gaps and 
boundaries of storage and location of data needs to be investigated. The movement away from 
traditional development thinking needs to happen in order to be agile in this area. We remain 
trapped in the mindset that requires a particular process to be followed: (1) data elements are 
identified for a particular purpose, (2) the metrics and measures for the element then defined, (3) 
the specifications are written to enhance a particular system for the purposes of storage, (4) 
development takes place to enhance the system, (5) the implementation of the upgrade is 
commissioned through extensive, often invaluable consultation, and (6) the process of capture 
and monitoring is documented and applied. 

Alignment of processes and systems will allow Unisa to not only use historical data in more 
effective (and appropriate) ways, but ensure an agile information architecture to optimise the 
potential and guard against the perils of Big Data. 

Thesis 5: Institutional Relevance, Context, and Knowledge 

With the possibilities of working with Big Data, traditional research methods will have to evolve 
to face this new reality of Big Data. The challenge lies also not only with the technical aspects of 
finding, organising and combining the often unstructured data, but with the contextual insights 
needed to interpret and apply the knowledge and intelligence gained. Contextual intelligence 
speaks to an in-depth understanding of the institution and how it functions, and will include 
knowledge of its operational and strategic objectives and direction. Herein lays one of the major 
challenges with bigger data. All this will require new ways of thinking on various levels. 
Importantly, researchers and analysts will need to balance the promise of Big Data, and the 
various opportunities it presents to uncover patterns in data and employ advanced analytics (the 
“what”), with an ongoing search for the drivers of student learning, success, retention, dropout 
and throughput (the “why”) in order to provide relevant intelligence with maximum impact. A 
stark reality within the Unisa context, which also faces institutional researchers globally, is the 
pertinent question of “So what”? If all the analytics and research do not result in actionable 
interventions by the university, in our case to strengthen support to students towards success, 
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then how relevant are we? This is a sobering thought, which also speaks to the tension between 
the provisioning of intelligence and influencing action. 


Limitations to this Study 

We acknowledge the concerns regarding case study methodology of a lack of rigor, and the 
impossibility to generalise from a single case (Yin, 2009). The value this case study adds is an 
attempt to generalise to “theoretical propositions, and not to populations or universes” (Yin, 
2009, p. 15). The theses proposed in the preceding section therefore could be used as a heuristic 
framework for engaging with the state and use of student data in other higher education contexts. 


(In)Conclusion 

Despite and amidst the fact that “big data is all the rage” (Richards & King, 2013, p. 41) and 
various claims that Big Data will revolutionise and transform the way we live, work, and think 
(Mayer-Schonberger & Cukier, 2013), there are also many authors who take a more sceptical and 
critical approach to Big Data (Amoore, 2011; boyd & Crawford, 2012, 2013; Bryant & Raja, 2014; 
Couldry, 2014; Lyon, 2014; Richards & King, 2013). 

In the current context of persisting concerns about student success and retention in higher 
education and in particular ODL (e.g., Subotzky & Prinsloo, 2011) and the prevailing logic of 
evidence-based decision-making and the “algorithmic turn” (Napoli, 2013, p. 1) and “algorithmic 
regulation” (Morozov, 2013a, par.15) in higher education, a critical interrogation of the potential 
of Big Data is called for. 

This article did not attempt to question the potential of Big Data for higher education but rather 
raised the question: “In order for big(ger) data to be better data, and to result in more effective 
and appropriate teaching, learning and support, what are the issues that we need to consider?” In 
the context of Unisa as a mega ODL institution, this question necessitated a sober, and somewhat 
sceptical (if not disenchanted), view of the practical implications to realise big(ger) data’s 
potential. We concluded the article with proposing five theses that, when considered, can 
increase the realisation of Big Data’s potential in a specific context, that of Unisa. 

Though some of the issues may be more applicable to higher and distance education contexts in 
developing world contexts, we propose the concluding theses to inform further research, 
contemplation and consideration. 
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